Start position of a match

If you want to find the position of a match in an incoming string, simply check the length of $` (that's $PREMATCH if you've chosen to use English) to see where it starts, and add the length of $& (that's $MATCH) to find where it ends.
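As a small worked example of that arithmetic, here's a minimal sketch. The sample string and the variable names $text, $start and $end are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration
my $text = "see http://example.com now";

my ($start, $end);
if ($text =~ m!https?://[^ >"]+!) {
        $start = length($`);            # number of characters before the match
        $end   = $start + length($&);   # offset just past the end of the match
        print "$start, $end\n";         # prints "4, 22"
}
```

The four characters of "see " come before the match, and the matched URL is 18 characters long, so the match runs from offset 4 up to (but not including) offset 22.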

Let's say I want to find all the URLs referred to in a web page that's loaded into the variable $html. I could write:

push @section,[length($`),length($&),$1]
        while ($html =~ m!(https?://[^ >"]+)!g);

and that will give me a list of 3-element lists containing start point, length and actual string matched. Here's the code to display that list:

foreach $element(@section) {
        print (join(", ",@$element),"\n");       
}

and here are some of the results from the source of our resources index:

5979, 36, http://www.wellho.net/forum/top.html

6967, 36, http://www.wellho.net/net/mouth.html

7059, 42, http://www.wellho.net/downloads/index.html

8369, 67, http://www.wellho.net/mouth/387_Training-course-plans-for-2006.html

9365, 43, http://www.trainingcenter.co.uk/travel.html

9516, 45, http://reiseauskunft.bahn.de/bin/query.exe/en

9599, 59, http://www.livedepartureboards.co.uk/ldb/summary.aspx?T=MKM

9861, 48, https://lightning.he.net/~wellho/net/secure.html

P.S. I loaded my whole web page into a single variable using the code

open (FH,"/Library/WebServer/live_html/resources/index.html");
undef $/;
$html = <FH>;
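Since undef $/ changes the input record separator for the rest of the program, it may be worth keeping that change contained. Here's one possible sketch, assuming a helper sub called slurp_file (a made-up name, not something from the original script) that uses local so the separator is restored when the sub returns:

```perl
use strict;
use warnings;

# slurp_file is a hypothetical helper name used only for this sketch
sub slurp_file {
        my ($path) = @_;
        open(my $fh, '<', $path) or die "Cannot open $path: $!";
        local $/;               # undefined only until this sub returns
        my $content = <$fh>;    # reads the whole file in one go
        close $fh;
        return $content;
}
```

It would be called as $html = slurp_file($ARGV[0]); and any later line-by-line reads elsewhere in the program would still see the default separator.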

So, the whole script could be:

open FH,$ARGV[0] or die $!;
undef $/;
$html = <FH>;
 
push @section,[length($`),length($&),$1]
        while ($html =~ m!(https?://[^ >"]+)!g);
 
foreach $element(@section) {
        print (join(", ",@$element),"\n");       
}
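
If you have chosen to use English, the same loop can be written with $PREMATCH and $MATCH in place of $` and $&. This is only a sketch: the $html string below is a made-up stand-in for a loaded page, not real data from our site.

```perl
use strict;
use warnings;
use English;    # makes $PREMATCH and $MATCH available as names for $` and $&

# Hypothetical sample input standing in for a loaded web page
my $html = '<a href="http://example.com/a.html">A</a> <a href="https://example.org/b">B</a>';

my @section;
push @section, [length($PREMATCH), length($MATCH), $1]
        while ($html =~ m!(https?://[^ >"]+)!g);

# Each element holds start point, length and the string matched
foreach my $element (@section) {
        print join(", ", @$element), "\n";
}
```

Each start point and length can be checked by passing them back to substr($html, $start, $length), which should return the URL in the third field.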